1. Introduction



The aim of this chapter is to propose the use of a new extension of
standard Item Response Theory (IRT) modeling of dichotomous items to
include external variables. External variables may appear both as
categorical grouping variables and as continuous variables. This
requires the formulation of a model for the relationships between the
external variables and the response items. Given the availability of
sufficiently rich data, such extensions can yield a more informative and
powerful analysis of constructs and their measurements than has
so far been possible with standard IRT.

To make the discussion concrete, we will illustrate the methodology
in the context of educational achievement test data, analyzing the eighth
grade U.S. sample from the Second International Mathematics Study,
SIMS (Crosswhite, Dossey, Swafford, McKnight, & Cooney, 1985). The
achievement testing covered topics in algebra, measurement, geometry,
and arithmetic. The responses to a set of algebra items administered at
the end of the eighth grade will be related to a set of external variables
in the form of background variables measured at the beginning of the
eighth grade. The background variables include scores on mathematics
tests, family background variables, information on the student's attitude
towards math, and type of math class attended in the eighth grade. This
information will be brought together in a single model. The new general
feature of this model is that it simultaneously addresses four important
issues in item analysis:

(i) Estimation of IRT type item measurement parameters.

(ii) Assessment of the strengths of hypothesized antecedents to the
student's latent trait level.

(iii) Detection of item bias (differential item performance).

(iv) Testing and relaxation of the IRT requirements of 
unidimensionality and conditional independence. 

While the major novelty is the inclusion of external variables, there
are several new specific features of the analyses to be presented. One
feature is the relaxation of the conditional independence requirement for
certain items that by virtue of the question format have an association
that cannot be described solely by their common dependence on the
single trait. Another feature concerns the handling of items that have
been deemed "biased", e.g. items that are sensitive to instructional
coverage, but still contain valuable measurement information. Such
items can be retained in the model by explicitly including parameters
that describe the differential item performance. A third feature is the
potential for explaining item bias by the influence of background
variables. A fourth feature is a stronger test of unidimensionality
obtained by checking the homogeneity of the items in relation to the
background variables, not only by considering inter-item associations
as is customary. Finally,
the modeling is capable of including several sets of items of differing 

To prepare for a discussion of the general modeling approach of
Section 3 and the data analysis in Section 5, Section 2 briefly outlines
relevant latent variable measurement modeling theory for dichotomous
and continuous response variables. Section 3 outlines theory for the
structural equation modeling that we propose for data of this kind.
Section 4 describes the response items and a set of interesting
additional variables that are available in the SIMS data. Section 5 uses
this modeling approach to analyze the relationship between some of the
response variables of the SIMS data and a set of external variables.
Section 6 concludes.



The statistically less sophisticated reader may wish to skip sections 2 and 3 and
go straight to the description of the data in section 4. Before doing so, such a
reader may wish to note that the modeling framework is given in Figure 1, where
the relationships between the dichotomously scored y's and the latent trait $\eta$ are
described in an IRT fashion by two-parameter normal ogive item characteristic
curves, while the relationship between $\eta$ and the background variables of x is
described by a standard linear regression (although values for $\eta$ need not be
estimated to obtain these regression coefficients).

2. Latent Variable Measurement Modeling



Let us consider dichotomous and continuous response variable models.
Assume a vector of p continuous latent response variables y* that
follow a standard linear measurement model in each of g groups of
students (the student subscript i and the group subscript will be deleted),

$$y^* = \nu + \Lambda\eta + \varepsilon \tag{1}$$

where $\eta$ is the latent variable vector, $\varepsilon$ is the vector of measurement
errors, and $\nu$ and $\Lambda$ are intercept and slope (loading) measurement
parameters, so that

$$E(y^*) = \nu + \Lambda\kappa \tag{2}$$

$$V(y^*) = \Lambda\Psi\Lambda' + \Theta \tag{3}$$

where $\kappa$ is the mean vector of $\eta$, $\Psi$ is the covariance matrix of $\eta$, and
$\Theta$ is the covariance matrix of the measurement errors, usually assumed to
be diagonal.
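The moment structure in (2) and (3) can be made concrete with a small numerical sketch. The parameter values below are hypothetical and chosen only for illustration (three response variables, one latent factor); the function name `implied_moments` is likewise an assumption and is not part of any program used in this chapter.

```python
import numpy as np

def implied_moments(nu, Lambda, kappa, Psi, Theta):
    """Return E(y*) and V(y*) implied by the measurement model (1)."""
    mean = nu + Lambda @ kappa               # equation (2)
    cov = Lambda @ Psi @ Lambda.T + Theta    # equation (3)
    return mean, cov

# Hypothetical values: p = 3 response variables, one latent factor.
nu = np.zeros(3)                             # intercepts
Lambda = np.array([[1.0], [0.8], [0.6]])     # loadings
kappa = np.array([0.5])                      # factor mean
Psi = np.array([[1.0]])                      # factor variance
Theta = np.diag([0.3, 0.4, 0.5])             # diagonal error covariance

mean, cov = implied_moments(nu, Lambda, kappa, Psi, Theta)
```

With a diagonal $\Theta$, all covariation among the response variables comes from the common factor; the off-diagonal elements of `cov` are products of loadings times the factor variance.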

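When modeling dichotomous response variables we have for variable j

$$y_j = \begin{cases} 1, & \text{if } y_j^* > \tau_j \\ 0, & \text{otherwise} \end{cases} \tag{4}$$

where $\tau_j$ is a threshold parameter.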



When working with aggregates of items in the form of subscores or item
parcels, we assume a continuous response variable,

$$y = y^* \tag{5}$$

This is the standard confirmatory factor analysis measurement
framework of Joreskog (1969), extended to a comparative multiple-group
analysis in Joreskog (1971) and Sorbom (1974, 1978), extended to a
multiple factor dichotomous response model by Christoffersson (1975),
Muthen (1978), and Bock & Aitkin (1981), and further extended to
dichotomous multiple-group analysis in Muthen & Christoffersson (1981).
For an overview, see Mislevy (1986).



The generality of the above type of covariance/correlation structure
framework makes it suitable for a wide range of analyses involving validity
issues, see Joreskog (1978) and for instance Bohrnstedt (1983). One
specific example concerns the analysis of multitrait-multimethod matrices by
covariance structure methods; for a recent overview see Schmitt & Stults
(1986).



Let us consider factor analytic modeling of achievement variables of
the SIMS type. Our interest may be in assessing the dimensionality and
strength of relation between each observed variable and the construct(s). The
observed variables may represent the subscores for the different content
areas of algebra, measurement, geometry, and arithmetic. The subscores
may be broken down into suitable item parcels so that there are several
observed scores for each area. We may entertain the simplistic hypothesis
of a four-factor structure, assuming that the responses within each content
area are unidimensional and that the correlations between
the scores from different areas can be fully explained by their dependencies
on the correlated constructs. We may also study the measurement qualities
and relationships among the constructs across subgroups of students. By
multiple-group approaches we may then test hypotheses of invariant

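measurement parameters,

$$\nu^{(1)} = \nu^{(2)} = \dots = \nu^{(g)} = \nu \tag{6}$$

$$\Lambda^{(1)} = \Lambda^{(2)} = \dots = \Lambda^{(g)} = \Lambda \tag{7}$$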

If (6) and (7) are true we may next want to test the structural hypotheses



We may find that for different instructional exposure to the topics
covered in the test items, invariance of $\nu$ or $\Lambda$ may not hold for certain of
the item parcel scores related to certain constructs, while for other scores
measurement invariance may be found. As noted by Miller & Linn (1986),
the instructional coverage may be assumed to affect the construct in question
homogeneously across a set of test items, so that bias does not exist at the
item level. To further scrutinize such issues of validity in educational
achievement data, it is useful to be able to shift the analysis from the score
level down to the "micro" item level. Such an effort will be described
below, although it should be kept in mind that the techniques to be discussed
are equally applicable on the aggregated continuous score level.

3. A Structural Model 

Let y* be as in (1) and let the vector of latent constructs follow the
linear structural equation system

$$\eta = \alpha + B\eta + \Gamma x + \zeta, \tag{10}$$

where $\alpha$ is an intercept parameter vector, B is a matrix of slopes for
regressions among the $\eta$'s (the diagonal elements of B are zero and I - B
is nonsingular), $\Gamma$ is a matrix of slopes for regressions of the $\eta$'s on the
set of q exogenous observed x variables, while $\zeta$ is a vector of residuals.
With standard assumptions it follows that



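$$E(y^* \mid x) = \nu + \Lambda(I - B)^{-1}\alpha + \Lambda(I - B)^{-1}\Gamma x, \tag{11}$$

$$V(y^* \mid x) = \Lambda(I - B)^{-1}\Psi(I - B)'^{-1}\Lambda' + \Theta, \tag{12}$$

where $\Psi$ here denotes the covariance matrix of the residuals $\zeta$.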
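As a numerical illustration of (11) and (12), the following minimal sketch computes the reduced-form intercepts, slopes, and residual covariance matrix from a set of structural parameters. All values are hypothetical (two latent variables, one x variable), and the function name `reduced_form` is an assumption made for this example only.

```python
import numpy as np

def reduced_form(nu, Lambda, alpha, B, Gamma, Psi, Theta):
    """Reduced-form quantities of equations (11) and (12)."""
    m = B.shape[0]
    inv = np.linalg.inv(np.eye(m) - B)                   # (I - B)^{-1}
    intercept = nu + Lambda @ inv @ alpha                # constant term of (11)
    slopes = Lambda @ inv @ Gamma                        # coefficient of x in (11)
    resid_cov = Lambda @ inv @ Psi @ inv.T @ Lambda.T + Theta   # (12)
    return intercept, slopes, resid_cov

# Hypothetical values: two latent variables, eta_2 regressed on eta_1.
nu = np.zeros(2)
Lambda = np.eye(2)
alpha = np.array([1.0, 0.0])
B = np.array([[0.0, 0.0],
              [0.5, 0.0]])
Gamma = np.array([[2.0],
                  [0.0]])
Psi = np.eye(2)
Theta = 0.1 * np.eye(2)

intercept, slopes, resid_cov = reduced_form(nu, Lambda, alpha, B, Gamma, Psi, Theta)
```

In this example x affects the second latent variable only indirectly through the first, which shows up as a nonzero second row in `slopes`.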

This model framework was described in Muthen (1983, 1984), where it was
pointed out that structural models with dichotomous, ordered categorical, and
continuous latent variable indicators could be fitted into the following
three-part structure:

part 1: $\sigma_1 = \Delta^{*}\{K_\tau\tau - K_\nu[\nu + \Lambda(I - B)^{-1}\alpha]\}$, (13)
(mean/threshold/reduced-form regression intercept structure)

part 2: $\sigma_2 = \mathrm{vec}\{\Delta\Lambda(I - B)^{-1}\Gamma\}$, (14)
(reduced-form regression slope structure)

part 3: $\sigma_3 = K\,\mathrm{vec}\{\Delta[\Lambda(I - B)^{-1}\Psi(I - B)'^{-1}\Lambda' + \Theta]\Delta\}$. (15)
(covariance/correlation/reduced-form residual correlation structure)



Here, $\Delta$ represents a diagonal matrix of scaling factors related to the
covariance matrix V(y* | x) and the K matrices are designed to select
various elements. This model also encompasses the LISREL formulation of
Joreskog (1973, 1977) and Joreskog & Sorbom (1984). For an overview of
the various types of modeling that are possible, see Muthen (1983).

The parameters of the model are estimated by minimization of the generalized
least squares fitting function

$$F = \tfrac{1}{2}\,(s - \sigma)'\,W^{-1}(s - \sigma), \tag{16}$$

where s contains the sample quantities corresponding to $\sigma$, $\sigma' = (\sigma_1', \sigma_2', \sigma_3')$,
and W is an estimate of the asymptotic covariance matrix of s. Twice the F value
at the minimum gives an approximation to a large-sample chi-square test of model
fit to the restrictions imposed on $\sigma$. Large-sample standard errors of parameter
estimates are readily available. For technical details, see Muthen (1984).
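The fitting function (16) is simple to evaluate once s, $\sigma$, and W are given. The sketch below only evaluates F at a fixed $\sigma$; in an actual analysis $\sigma$ is a function of the model parameters and F is minimized over them, which is not shown here. The numbers are hypothetical and the function name `gls_fit` is an assumption.

```python
import numpy as np

def gls_fit(s, sigma, W):
    """Evaluate F = 1/2 (s - sigma)' W^{-1} (s - sigma), as in (16)."""
    d = np.asarray(s, dtype=float) - np.asarray(sigma, dtype=float)
    # Solve W z = d rather than forming the inverse of W explicitly.
    return 0.5 * d @ np.linalg.solve(W, d)

# Hypothetical sample statistics, model-implied values, and weight matrix.
s = np.array([0.5, 0.3])
sigma = np.array([0.4, 0.3])
W = np.diag([0.01, 0.02])

F = gls_fit(s, sigma, W)
chi2_approx = 2.0 * F    # twice F at the minimum approximates the chi-square
```

Solving the linear system instead of inverting W is a numerical convenience; it matters when W is large, which is the situation discussed later in this section.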
3.1 Extending IRT to external variables: a MIMIC structural probit
model

Of particular interest in this paper is the formulation of a special
case of the above general model, namely a model with a single construct
underlying a set of dichotomous items (letting $\nu$ = 0),

$$y^* = \lambda\eta + \varepsilon \tag{17}$$



It is well known that assuming a normal $\varepsilon$ that is independent of $\eta$ and
has independent elements gives rise to the two-parameter normal ogive
model of Item Response Theory (IRT), see e.g. Lord & Novick (1968). This
specifies a probit regression of each y on $\eta$. We will now extend this IRT
model to include a set of regressors x,

$$\eta = \alpha + \gamma' x + \zeta. \tag{18}$$



The model is schematically depicted in Figure 1.
The reduced-form solution for y* is

$$y^* = \lambda\alpha + \lambda\gamma' x + \lambda\zeta + \varepsilon \tag{19}$$



The reduced-form regression intercept vector is $\lambda\alpha$, the reduced-form
regression slope matrix is $\lambda\gamma'$ and has rank one, while the reduced-form
residual covariance matrix $\lambda\psi\lambda' + \Theta$ has a single factor correlational
structure. To standardize, we take V(y* | x) to have unit diagonal elements.
We will add the multivariate probit assumption that y* | x is multivariate
normal. Note that this does not mean that we assume normality for the y*'s
or for $\eta$, but normality is merely required for the residual $\zeta$ and for $\varepsilon$. The
distribution of $\eta$ and the y*'s is actually to some extent generated by the x's.
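The properties of the reduced form just described can be checked numerically. The following minimal sketch uses hypothetical values for the loadings, slopes, and thresholds (three items, two regressors); the thresholds play the role of $\tau_j$ in (4). It verifies that the slope matrix has rank one, standardizes V(y* | x) to unit diagonal, and computes the implied probit probabilities P(y_j = 1 | x). The helper names are assumptions made for this illustration.

```python
import math
import numpy as np

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical parameter values for p = 3 items and q = 2 regressors.
lam = np.array([0.9, 0.7, 0.5])      # loadings, lambda in (17)
gamma = np.array([0.4, -0.2])        # slopes of eta on x, gamma in (18)
alpha = 0.0                          # intercept in (18)
psi = 1.0                            # variance of the residual zeta
tau = np.array([0.36, 0.0, -0.5])    # thresholds (assumed values)

# Standardize: choose the error variances so that V(y*|x) has unit diagonal.
theta = 1.0 - psi * lam**2

slopes = np.outer(lam, gamma)                            # lambda gamma'
resid_cov = psi * np.outer(lam, lam) + np.diag(theta)    # lambda psi lambda' + Theta
slope_rank = np.linalg.matrix_rank(slopes)

def item_probabilities(x):
    """P(y_j = 1 | x) for each item, given unit conditional variances."""
    cond_mean = lam * (alpha + gamma @ x)
    return np.array([normal_cdf(m - t) for m, t in zip(cond_mean, tau)])

probs = item_probabilities(np.array([1.0, 0.0]))
```

Because every row of `slopes` is a multiple of the same vector $\gamma'$, the x variables can change the response probabilities only through the single latent variable; this is the restriction that the later analyses test and selectively relax.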

In its continuous response form, this is the traditional so-called
MIMIC (multiple indicators and multiple causes) structural equation model
described e.g. in Joreskog & Goldberger (1975); see also references therein.
For dichotomous response variables, this type of model has been studied in
Muthen (1979, 1981, 1983, 1985), and in Muthen & Speckart (1985),
where it was termed a structural probit model.

A multiple group version of the MIMIC model with dichotomous
responses would seem to be particularly useful in analyzing the present set
of achievement data, allowing a simultaneous analysis of several groups of
students with respect to both measurement and structural properties in a
single framework.

The generalized least squares estimator becomes computationally heavy
with a large number of elements in $\sigma$. Exceeding much beyond, say, 250
elements gives rise to unreasonable computing demands both in terms of storage
and time. While an unweighted least squares estimator, using W = I, presumably
can handle at least twice this number, it would not give a chi-square model test,
nor would standard errors be provided. A simultaneous multiple-group analysis
would normally involve all three parts of the model. However, in a single group
analysis only the $\sigma_2$ and $\sigma_3$ parts of the model need be used, since such a model
does not impose restrictions on $\sigma_1$. With p denoting the number of y variables and
q denoting the number of x variables, there are pq elements in $\sigma_2$ and p(p-1)/2
elements in $\sigma_3$. While problems with p=5, q=30 and p=10, q=15 could easily be
handled by the generalized least squares estimator, p=15 would restrict q to less
than 10. Larger models could be handled by ignoring the restrictions imposed on
the $\sigma_3$ part, which would use less information in the estimation but would give all
the results needed. Here, p=20, q=10 could be handled with somewhat heavy but
not excessive computations. In the analyses of section 5, a single group analysis
using $\sigma_2$ and $\sigma_3$ was carried out with p=8 and q=24 and a multiple-group analysis
of two groups with p=8 and q=14. While the multiple-group analysis involved
modest computing, the single group analysis, using 224 $\sigma$ elements, involved rather
heavy but not excessive computing. Still, it is clear that the analyses proposed are
best suited to the detailed scrutiny of a small set of items.
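The size considerations above follow from simple counting. As a rough sketch, assuming the informal limit of about 250 elements mentioned in the text (the limit and the function names are illustrative assumptions, not properties of any program), the number of slope and residual correlation elements can be tallied as follows.

```python
def n_sigma_elements(p, q, use_sigma3=True):
    """Count the p*q reduced-form slopes and, optionally,
    the p*(p-1)/2 reduced-form residual correlations."""
    n = p * q
    if use_sigma3:
        n += p * (p - 1) // 2
    return n

def max_q(p, limit=250, use_sigma3=True):
    """Largest number of x variables for which the count stays within limit."""
    q = 0
    while n_sigma_elements(p, q + 1, use_sigma3) <= limit:
        q += 1
    return q

largest_q_for_15_items = max_q(15)                          # 9, i.e. less than 10
slopes_only_20_by_10 = n_sigma_elements(20, 10, use_sigma3=False)
```

For p = 15 the residual correlations alone account for 105 elements, which is why q is restricted to fewer than 10 regressors under this limit.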

4. The SIMS Data 

To illustrate the methodology in a realistic setting, we will use data
from the Second International Mathematics Study (Crosswhite, Dossey,
Swafford, McKnight, and Cooney, 1985). We will be concerned with a
subset of data from the population of U.S. eighth grade students
enrolled in regular mathematics classes. A national probability sample
of school districts was selected proportional to size; a probability
sample of schools was selected proportional to size within school
district; and two classes were randomly selected within each school,
yielding a total of about 280 schools and about 7,000 students
measured at the end of Spring 1982.

The achievement test contained 180 items in the areas of arithmetic,
algebra, geometry, probability and statistics, and measurement
distributed among five test forms. Each student responded to a core
test (40 items) and one of four randomly assigned rotated forms (34 or
35 items). All items were presented in a five category multiple choice
format. In Section 5 our analysis will not include probability and
statistics and will only use the core items within the other areas, 8
each for algebra, geometry, and measurement, and 16 for arithmetic. In
this chapter, the responses to the eight algebra items will be of
particular interest.



The instructional coverage of algebra, and the mathematics 
curriculum in general, is rather varied for U.S. 13 year olds. Hence, 
to complement the item response information for these algebra items,
we will utilize a class-level variable which categorizes the mathematics 
classes into four types, basic or remedial arithmetic (REMEDIAL), 
general or typical mathematics (TYPICAL), pre-algebra or enriched 
(ENRICHED), and algebra (ALGEBRA). Furthermore, we will check the 
plausibility of our analyses by drawing from class-level, item-specific, 
information on teacher reports of opportunity to learn (OTL), where a 
student is regarded as having OTL if the teacher taught or reviewed the 
mathematics needed to answer the item correctly either during this year 
or prior school years. 

The responses to the SIMS items discussed above were collected at
the end of the eighth grade. The achievement level obtained by the
student on the various aspects of the mathematics content has at that
point of time been influenced by factors such as the type and amount of
instruction given during the school year, initial aptitude, motivation,
and interest in the topic, and a variety of socio-demographic and other
variables. Regarding algebra achievement, the outcome should be
strongly related to the type of class attended, since in the eighth grade
the content of the algebra test would usually only be well covered in the
enriched (pre-algebra) or algebra classes. To a certain extent, selection
into such classes takes place based on the student's seventh grade
scholastic performance in mathematics, particularly the central topic of
arithmetic. The participation in eighth grade algebra classes may have
important consequences since this allows students to take calculus in
high school, which in turn opens up possibilities to study science and
mathematics topics in colleges and universities (see also Kifer, 1984).

Much could be learned if student posttest performance could be related
to the mathematics course taken and to student characteristics as they 
entered the course. With the SIMS data we are in the fortunate position 
of having available a set of such external measurements from the 
beginning of the eighth grade. Fall 1981 "pre-test" data was gathered 
for a large portion of the "post-test" students measured in the 
spring of 1982. We will use this additional data to study both the 
algebra posttest item responses and a set of external variables in the 
framework of a model that relates the posttest algebra achievement to 
pretest predictors. These additional pieces of background data will now
be briefly described. 

The pretest data were gathered in the same way as the posttest data.
The new set of variables to be used in our model in addition to the
posttest algebra items includes pretest scores on the core items of
algebra, measurement, geometry, and arithmetic, measurements of
father's and mother's education, father's occupation, ethnicity, gender,
attitude measurements describing the student's interest in more
education, how useful he or she thinks mathematics knowledge will be,
and his or her attraction to mathematics, and finally information on
class type. The measurement and scoring of these background variables
is described in Table 1. The abbreviations of Table 1 will be used
from now on. It is important to note that some of the variables were
measured only at the posttest occasion, particularly MORED, USEFUL, and
ATTRACT. These three measures were taken from Delandshere (1986).



Insert Table 1 about here 



The wording of the eight posttest algebra core items is given in Table 2.



Insert Table 2 about here 



The sample used for analysis is the match between post- and pre-test
students that have complete data on all variables except father's
occupation. For this variable there was unfortunately a large portion of
missing data and it was decided to retain such observations by including
missing data as a special category, in addition to the dummy coded
categories Low, Middle, and High. The analysis sample is, however,
only a subset of the two pre- and post-test data sets and in order to
judge the effects of the missing data, Table 3 gives descriptive
statistics for relevant variables from each of the three data sets. For
purposes of simplifying the analyses, the variables have all been
transformed to a 0 - 1 range. The analysis sample has somewhat higher
means than the other samples both on variables thought to be positively
correlated with achievement and on post-test algebra performance.
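The coding of father's occupation with a separate missing-data category, and the rescaling of variables to a 0 - 1 range, can be sketched as follows. Two points are assumptions made for this illustration: that Middle serves as the reference category (suggested by the regressor names LOWOCC, HIGHOCC, and MISSOCC used later), and that the 0 - 1 transformation is a linear rescaling by the observed minimum and maximum. The function names are hypothetical.

```python
def occupation_dummies(category):
    """Dummy-code father's occupation; None marks a missing value.

    Assumes 'Middle' is the reference category, coded as all zeros.
    """
    return {
        "LOWOCC": int(category == "Low"),
        "HIGHOCC": int(category == "High"),
        "MISSOCC": int(category is None),
    }

def rescale_01(values):
    """Linearly map a list of scores onto the 0 - 1 range (assumed method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale a constant variable")
    return [(v - lo) / (hi - lo) for v in values]

missing_case = occupation_dummies(None)
scaled_scores = rescale_01([0, 4, 8])
```

Treating missingness as its own category keeps such students in the analysis sample instead of dropping them through list-wise deletion.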
Insert Table 3 about here



Although not included directly in our analysis in Section 5, we will
also utilize the item-specific OTL measurements on the post-test
algebra items in order to enhance our understanding of the analysis.
The upper panel of Table 4 gives the percent correct on each item
broken down by class type, while the bottom panel gives the
corresponding OTL means.



Insert Table 4 about here 



5. Analysis of the SIMS Data by a Structural Model

Let us now analyze the SIMS data using the modeling framework
presented in sections 2 and 3. It may be noted that the proposed analyses
cannot be handled by present IRT software, nor by present structural
equation modeling software, such as LISREL. The estimation and testing of
the models to be presented was carried out by an experimental version of the
LISCOMP computer program (Analysis of Linear Structural Equations by a
Comprehensive Measurement model), developed by the author. (The program
is now available to general users in an IBM mainframe version through
Scientific Software Inc., Chicago, Ill. (317) 831-6296.) This program
provides limited information generalized least-squares estimation of the
model parameters as they appear in the three-part structure of Section 3.
Standard errors of estimates and a large-sample chi-square test of fit to the
restrictions on the three model parts are also provided.

We consider the MIMIC model of Figure 1. The y vector of response
items corresponds to the eight items of Table 4. The x vector of regressors
consists of the 17 background variables given in Table 1: PREALG,
PREMEAS, PREGEOM, PREARITH, FAED, MOED, MORED, USEFUL,
ATTRACT, NONWHITE, REMEDIAL, ENRICHED, ALGEBRA, FEMALE,
LOWOCC, HIGHOCC, MISSOCC, and seven interaction terms: between
NONWHITE and the three class type dummies, between PREARITH and the
class type dummies, and between NONWHITE and PREARITH. In a
preliminary analysis we also included interactions between sex, PREALG, and
the class type dummies, but these were not found significant. The latent
variable construct, posttest algebra achievement as measured by the
core items, is viewed as an intervening variable in the regressions of the y's
on the x's.
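The composition of the x vector can be spelled out as a short sketch that lists the 17 background variables, forms the seven product terms, and confirms the total of 24 regressors. The abbreviation MOED for mother's education is assumed here by analogy with FAED, and the helper `add_interactions` is a hypothetical name used only for this illustration.

```python
background = [
    "PREALG", "PREMEAS", "PREGEOM", "PREARITH", "FAED", "MOED", "MORED",
    "USEFUL", "ATTRACT", "NONWHITE", "REMEDIAL", "ENRICHED", "ALGEBRA",
    "FEMALE", "LOWOCC", "HIGHOCC", "MISSOCC",
]
class_types = ["REMEDIAL", "ENRICHED", "ALGEBRA"]

# Seven interaction terms: 3 + 3 + 1.
interactions = (
    [("NONWHITE", c) for c in class_types]
    + [("PREARITH", c) for c in class_types]
    + [("NONWHITE", "PREARITH")]
)

def add_interactions(row):
    """Return a copy of row (variable name -> value) with product terms added."""
    out = dict(row)
    for a, b in interactions:
        out[f"{a}*{b}"] = row[a] * row[b]
    return out

n_regressors = len(background) + len(interactions)   # 17 + 7 = 24
```

The count of 24 matches the value q = 24 given for the single group analysis in Section 3.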



We have attempted to use a large set of regressors which
also contains some variables that may not have a direct substantive influence
on the latent variable construct. This was done for two reasons. One reason
relates to the fact that our analysis sample was obtained by "list-wise
deletion" of incomplete cases where, judging from Table 3, the missingness
appeared to be somewhat selective. If the missingness on the y's can be
largely predicted by the included x's, the bias that could potentially have
resulted in the parameters of the regressions may be small (cf. Marini,
Olsen, & Rubin, 1980). A second reason is related to the fact that we will
also study subgroups of students in certain class types, which will involve
the analysis of selective samples. For instance, Kifer (1984) noted that
whites are overrepresented in algebra courses, and also that "...almost 2/3
of the students in algebra classes have pretest arithmetic scores in the top
quarter of the distribution", while "...almost 2/3 of the students whose
pretest arithmetic scores are in the top quarter are not in algebra classes."
Hence, we have included various interaction terms among the x's involving
Ethnicity, Class type, and pretest arithmetic score, again to reduce potential
bias. Furthermore, Muthen (1986) found that in addition to pretest scores
and demographic variables, class type membership was also strongly related
to the attitude variables ATTRACT and MORED.

Section 5.1 deals with certain weaknesses in the actual data analysis. The 
reader who merely wants to view the analyses as illustrations of the 
potential of the new type of modeling may want to skip to section 5.2.

5.1 Analysis caveats 



We may recognize some weaknesses in the forthcoming analyses related to
the sampling, the temporal ordering of the variables, and the potential of
measurement error and omitted variables in the set of x's, problems which
may cause bias in the regressions. First of all, our analyses ignore the
complications of stratified sampling and multilevel, hierarchical
observations. Although we realize that these features may have non-negligible
consequences, the proper methods for handling them are not available in this
context. Second, the attitudinal measurements MORED, USEFUL, and
ATTRACT were obtained only at the posttest occasion, causing a possible problem
if attempting to view these regressors as predictors of both entrance into
advanced eighth grade classes and posttest achievement. These scores
presumably reflect attitudes built up both before and during the eighth grade,
although they are most likely not a direct reflection of the posttest
performance. Furthermore, the pretest scores are created from a small
number of items, giving rise to low reliability. Although the rotated form
items could have been used, this was avoided since it would have
involved either equating of observed scores or using IRT techniques with sets of
items many of which may have low validity at the pretest due to rather
limited OTL. For the 16 pretest arithmetic items, an attempt was made to
avoid the influence of measurement error by instead using factor scores.
These were obtained in the form of estimated $\theta$ values from a marginal
maximum likelihood estimation (see Bock & Aitkin, 1981) of the 16 items
with a three-parameter logistic model using the computer program BILOG
(Mislevy & Bock, 1984). Although reduction of measurement error would
have been even more desirable for the other subtests, which involve fewer
items, it was judged that the small number of items and the heterogeneous
OTL measures for these subtests might not yield reliable results by IRT
methods. For algebra and measurement, one item each was rejected as
invalid in relation to the total 40 item score. This results in "favoring" the
variable PREARITH in the search for influential regressors. However, it
was thought to be important to try to measure this variable well since it may
be viewed as a proxy for final seventh grade mathematics achievement, which
is an important factor in deciding eighth grade curriculum.

A further measurement flaw is the 40 % missingness on father's
occupation. We should also note that the Ethnicity category NONWHITE is a
very heterogeneous group consisting of 741 students, broken down as 8 %
American Indians, 41 % Blacks, 17 % Chicano, 6 % Latin, 9 % Oriental, and
19 % Other. In terms of omitted variables, parental income may be a
predictor of class type but was not measured, and it would have been very
valuable if more general ability measures had been available before entrance
into the eighth grade instead of merely fall pretest scores. Also, measures
of reading comprehension and vocabulary would have been of interest since
they might play a role in "word problems".

Preliminary analyses were carried out on the posttest response items in
order to investigate the presence of guessing (or non-zero lower item
characteristic curve asymptote) and/or violations of unidimensionality in the
algebra items. Marginal maximum likelihood estimation of the two- and
three-parameter logistic IRT models was carried out in BILOG and
unidimensionality was tested both via LISCOMP's limited information GLS
procedure and via the full information estimation procedure of TESTFACT
(Wilson, Wood, & Gibbons, 1984; see also Bock, Gibbons, & Muraki,
1985), in both cases assuming zero lower asymptotes. While
unidimensionality could not be rejected using these approaches, the likelihood
ratio chi-square test of zero lower asymptotes obtained a value of 46 with 8
degrees of freedom. Although the large sample size of 4,320 yields a strong
power for rejection and lower asymptotes may not be well estimated from
such a small number of items, there seems to be a possibility of some nonzero
asymptotes. The influence of this on our two-parameter model would
presumably be a slight underestimation of the corresponding slope (loading)
and a biasing of the threshold, while structural parameters may be relatively
unchanged. Anticipating the analysis discussion below, it is interesting to
note that neither the difficult item 1 nor item 5 exhibits significant
asymptotes, either when analyzing the 8 algebra items alone or together with
the other core items in a 40 item analysis (39 items were actually used due
to one flawed item).
5.2 A structural model for all students: Model I



In the first step of the analysis we will consider the strongest and most
restrictive model, where achievement is viewed as a unidimensional
construct, so that a single latent variable intervenes in the regressions of the
y's on the x's, without any direct regression paths from x's to y's. This
model will be called Model I. It should be noted that in this first step of
the analysis, the categorical grouping variables of class type, gender, and
ethnicity are included as dummy coded variables among the set of x's. Our
intention is to let the analysis of Model I, and modifications thereof, assist
in generating ideas for subsequent simultaneous multiple-group analyses,
where the grouping is based on such categorical variables, and where a more
detailed analysis is possible. For our first analyses of the whole analysis
sample of 4,320 students, the complete set of assumptions in Model I may
not be entirely realistic, since we include all the different types of eighth
grade classes, while Table 4 clearly shows that percent correct and OTL
vary greatly and in different patterns for different items over these
classes. Nevertheless, this may be a useful starting point for our analysis.



Model I is an overidentified model, which imposes 188 restrictions on the
reduced form regression slopes and residual correlations. The standard IRT
unidimensionality assumption with conditional independence contributes 20
restrictions, since 28 reduced form residual correlations are described by 8
parameters related to the measurement part. The concept of an intervening
latent variable construct in the regressions of the y's on the x's contributes
the remaining 168 restrictions, since 192 reduced form regression slopes
are described by merely 24 structural regression slope parameters. Hence,
in terms of restrictions imposed, the content of the model is largely a result
of using the external variables of x and imposing MIMIC restrictions on the
regression slopes for y on x. Utilizing external variables in this way gives a
more powerful assessment of measurement qualities for the y's
than would be obtained by considering responses to the y's alone as in
standard IRT.
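The count of restrictions can be reproduced with a few lines of arithmetic. The sketch below follows the accounting given in the text for a single-factor MIMIC model with p items and q regressors; the function name `mimic_restrictions` is an assumption made for this illustration.

```python
def mimic_restrictions(p, q):
    """Restrictions imposed by a single-factor MIMIC model.

    Returns (measurement part, structural part, total).
    """
    n_corr = p * (p - 1) // 2        # reduced-form residual correlations
    n_slopes = p * q                 # reduced-form regression slopes
    from_measurement = n_corr - p    # correlations described by p loadings
    from_structure = n_slopes - q    # slopes described by q structural slopes
    return from_measurement, from_structure, from_measurement + from_structure

restrictions_model_1 = mimic_restrictions(8, 24)   # (20, 168, 188)
```

For Model I, with p = 8 and q = 24, roughly nine tenths of the restrictions come from the regression slopes, which illustrates why the external variables carry most of the testable content of the model.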

The large-sample chi-square test of fit to the 188 restrictions of Model I
obtained a value of 681. This represents a significantly misfitting model.
However, given the power resulting from the large sample size of 4,320, the
value is in our opinion small enough to warrant attempts at modifying details
of this first approximation rather than rejecting it in its entirety.
Throughout, we will use the chi-square test results more as descriptive
measures of overall fit for a sequence of models fitted to the same data than
as a rigorous hypothesis testing instrument. In terms of such a descriptive
usage, some experience with structural models for dichotomous response
data leads us to judge as reasonable fit a chi-square to degrees of freedom
ratio, scaled to a sample size of 2,000, that is less than say 1.5 (this ratio
is 1.7 for Model I). We know that there may be clear substantive reasons
for lack of fit in parts of Model I and we will not be satisfied with the model
as it stands, but investigate the possible reasons for misfit in an attempt to
arrive at a modified Model II.
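
The scaling of the ratio to a sample size of 2,000 is not spelled out in the
text. A minimal sketch, assuming the chi-square value is simply rescaled in
proportion to sample size, reproduces the value quoted for Model I; the
function name and the proportional rule are assumptions.

```python
def scaled_chi2_ratio(chi2, df, n, n_ref=2000):
    """Chi-square/df ratio rescaled proportionally from sample size n to n_ref."""
    return (chi2 / df) * (n_ref / n)

# Model I: chi-square of 681 with 188 degrees of freedom, N = 4,320.
ratio_model_1 = scaled_chi2_ratio(chi2=681, df=188, n=4320)
print(round(ratio_model_1, 1))  # 1.7
```

The unscaled ratio is about 3.6, so the rescaling matters a great deal for the
descriptive judgement of fit.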

The fact that Model I is strongly over-identified offers the opportunity to
check the appropriateness of the various assumptions involved and to relax
some restrictions if judged necessary. This would not be possible in a
straightforward multivariate regression of the y's on the x's, but is the result
of our notion of a single latent construct. To aid in attempts to check the fit
of the various restrictions, so called modification indices will be used. They
are similar to what is provided in the LISREL structural equation modeling
program (Joreskog & Sorbom, 1984). Such an index reflects the expected
improvement in fit if a restricted parameter, such as one set to zero, is
allowed to be freely estimated. The indices to be used in this version of
LISCOMP are not scaled to represent the chi-square metric as in LISREL,
but are merely the first-order derivatives of the fitting function with respect
to the parameters. It should be noted that the use of these modification
indices as a data exploration device may be dangerous. The information from
the various indices for a certain model can be misleading for several reasons:
the indices may be highly correlated, the information really only pertains to
freeing up one parameter at a time, the indices are only good approximations
for models that are close to a well-fitting one, and we may capitalize on
chance in our data. Below, we will try to use these indices with care in
conjunction with substantive considerations.



The modification indices for Model I are given in Table 5 below. The
indices in the top part of the table give information on which direct paths
from x's to y's may need to be freed from their restriction to zero. These
paths correspond to the broken arrows of Figure 1. The indices in the
bottom part of the table give information on potential violations of the
conditional independence assumption of zero correlations among the
residuals. In this table, the first-order derivative modification indices have
reversed signs so that the present sign describes the expected direction of
change from zero in a parameter. The derivative values have also been
divided by 10 and rounded.
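
The reason for reversing the sign can be seen in a toy example with a single
parameter fixed at zero. The sketch below is not the LISCOMP fitting function:
the quadratic discrepancy, the value 0.12, and the function names are
assumptions chosen only for illustration, and the rescaling by 10 and rounding
used in Table 5 are omitted.

```python
def fit_function(theta, observed=0.12):
    """Toy discrepancy between an observed residual correlation and its model value theta."""
    return (observed - theta) ** 2

def derivative_at(theta, h=1e-6):
    """Central-difference approximation to the derivative of the fit function."""
    return (fit_function(theta + h) - fit_function(theta - h)) / (2 * h)

raw_derivative = derivative_at(0.0)   # negative: raising theta from zero reduces misfit
index = -raw_derivative               # sign reversed, as described for Table 5

print(raw_derivative < 0, index > 0)  # True True
```

A positive sign-reversed index thus points to a parameter that would move in
the positive direction if freed, which is how the entries of Table 5 are read.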



Insert Table 5 about here 



Scrutinizing Table 5 in conjunction with other substantive information will
lead us to Model II. Let us only consider the three largest modification
indices for Model I, marked by asterisks in Table 5. Starting with Item 1's
index of 17 for the ALGEBRA class dummy (comparing to the category of
Typical classes), we have an indication of a positive direct
"effect" of membership in algebra classes on the performance on Item 1 (c.f.
Muthen, 1986). It should be kept in mind that this direct influence occurs
over and above the influence of the latent achievement construct on Item 1.
This implies that students with the same algebra achievement level, but
belonging to different class types, may perform differently on Item 1;
algebra class membership gives an advantage. Hence, we have a suggestion
of "item bias", or rather instructional sensitivity in item 1. This empirical
observation makes substantive sense when we consider our auxiliary
information. This is the only one of the algebra items that deals explicitly
with "solving for x". Table 4 shows that this is the hardest of the eight
items, with a large difference in proportion correct between students of
typical and algebra classes, and with the largest difference in OTL between
typical and algebra classes. From Table 4 we see that Items 6 and 7 have
somewhat similar features, but none of these items exhibit large ALGEBRA
modification indices in Table 5. It seems as if in this set of items the lack
of instructional coverage in typical classes has a particularly detrimental
effect on the response to item 1.

The largest modification index for direct x to y paths in Table 5 occurs for
Item 5 on the dummy variable FEMALE. This suggests a gender item bias.
The negative sign would imply that, for a given achievement level, females
perform worse on Item 5 than males. We may note that this item involves a
"word problem" in a way the other items do not. This potential gender
difference will be further analyzed below. The largest modification index in
Table 5 occurs for a correlation between the measurement errors of items 6
and 8, suggesting a violation of the conditional independence for these two
items in the form of a positive correlation. From the item wording of Table
2 we do in fact note that both items, and none of the others, involve a direct
translation of a word problem into a mathematical formula. Hence it is
possible that the correlation may indicate the presence of a specific skill, in
addition to the algebra achievement construct, required for such a translation.



5.3 A structural model for all students: Model II 

Let us now free up the above three parameters that were fixed to zero in
Model I and consider the modified Model II. This model obtained a chi-square
value of 441.59 with 185 degrees of freedom. The difference in chi-square
from Model I is 240 with 3 degrees of freedom. Given the sample size we
regard this outcome as an indication of a reasonable overall fit in the major
parts of the model, although further adjustments could be made. Some
interesting details may be noted before we consider the estimates of Model
II. First, in this case freeing up one of the three parameters at a time
would by use of the largest modification indices lead to the same final result,
irrespective of the order in which this was done. Second, the major results
in terms of general magnitude and significance of structural coefficients
remain largely unchanged when going from Model I to Model II. Third, for
Model II the modification index for PREARITH x ALG has been reduced to
almost zero from the Model I value of 9, the Model I value of 10 for item 8
on FEMALE has only been reduced to 8, the Model I value of 8 for item 3 on
ENRICH remains the same, and the Model I value of -8 for item 5 on
NONWHITE also remains the same. The remaining major modification
indices now appear among the error correlations with a few values of about
1 # 
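
The comparison of Model I and Model II can be retraced from the reported
chi-square values. In the sketch below, the proportional rescaling of the
chi-square to degrees of freedom ratio to a sample size of 2,000 is an
assumption (the text does not give the formula), and the variable names are
illustrative.

```python
chi2_model_1, df_model_1 = 681.0, 188
chi2_model_2, df_model_2 = 441.59, 185
n = 4320

# Improvement in fit from freeing the three parameters.
chi2_diff = chi2_model_1 - chi2_model_2   # about 239.4, reported as 240
df_diff = df_model_1 - df_model_2         # 3

# Descriptive fit ratio for Model II, rescaled to a sample size of 2,000.
ratio_model_2 = (chi2_model_2 / df_model_2) * (2000 / n)

print(round(chi2_diff, 2), df_diff, round(ratio_model_2, 1))
```

Under this assumption the Model II ratio falls below the rule-of-thumb value
of 1.5 used earlier for judging fit.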

The parameter estimates for Model II are given in Table 6, where the
first part of the table gives measurement parameter results and the second
part gives results on structural parameters. For the measurement part we
also give estimated reliabilities for each item.



Insert Table 6 about here



The estimated reliabilities are in some cases rather low, although we
must bear in mind that these are item level responses. Since items 1 and 5
are directly related to both the latent construct to be measured and one of the
regressors, these two items, in relation to the other items in the set, are
not homogeneous with respect to the set of regressors (c.f. Muthen, 1985).
Regarding the structural parameter estimates, we find expected strong,
significant influences on achievement from PREARITH and PREALG, and the
other pretest scores, but also from USEFUL, ALGEBRA, FEMALE, and
HIGHOCC. The significance of the last three dummy variables implies that,
given other regressor values being equal, membership in advanced classes
rather than typical ones, being female, and having a father in the high
occupation category rather than the middle one, are conditions associated
with a higher level of algebra achievement as represented by the latent
variable construct.
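
No formula for the reliabilities of Table 6 is given in this section. One
reading that reproduces several of the tabled values is sketched below. It
rests on two assumptions: the latent response variable has unit residual
variance given x (a common probit standardization), and reliability is the
share of latent response variance that is due to the construct. The construct
standard deviation of 0.87 and the explained proportion of 73 % are the values
reported for Model II in the next paragraph; the function name is hypothetical.

```python
def item_reliability(loading, construct_var, construct_resid_var):
    """Share of latent-response variance due to the construct (assumed definition)."""
    true_var = loading ** 2 * construct_var
    # Assumes Var(y* | x) = 1, so the measurement error variance is what remains.
    error_var = 1.0 - loading ** 2 * construct_resid_var
    return true_var / (true_var + error_var)

construct_var = 0.87 ** 2                          # construct SD of 0.87
construct_resid_var = (1 - 0.73) * construct_var   # 73 % of variation explained by x

reliabilities = {
    item: round(item_reliability(loading, construct_var, construct_resid_var), 2)
    for item, loading in [("Item 1", 0.54), ("Item 2", 0.88), ("Item 3", 1.00)]
}
print(reliabilities)  # {'Item 1': 0.19, 'Item 2': 0.41, 'Item 3': 0.49}
```

These agree with the Table 6 entries for the three items, which lends some
support to the assumed definition without establishing it.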

In addition to this, we find from the bottom of Table 6 that for a given
value of the achievement construct, membership in algebra classes and being
female, respectively, is associated with a higher level of performance on
item 1 and a lower performance on item 5, respectively. From the estimated
parameters and the sample mean vector and covariance matrix for x, we may
also calculate the mean and variance of the latent variable construct and the
proportion of variation in this construct that is accounted for by the set of
regressors. We obtained a mean of 2.20, a standard deviation of 0.87, and
73 % of the variation was accounted for. Using the mean and standard
deviation we can translate the measurement parameter estimates to standard
IRT a and b values on a 0,1 θ scale (see below in relation with Table 8).
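
The calculation of the construct mean, variance, and explained proportion can
be sketched as follows. The two-regressor numbers are hypothetical and are not
taken from the SIMS data; the function name and argument layout are likewise
assumptions made for illustration.

```python
def construct_moments(gamma, x_mean, x_cov, resid_var, intercept=0.0):
    """Mean, variance, and explained proportion of a construct regressed on x."""
    k = len(gamma)
    mean = intercept + sum(gamma[i] * x_mean[i] for i in range(k))
    # Variance explained by x: gamma' * Cov(x) * gamma.
    explained = sum(gamma[i] * x_cov[i][j] * gamma[j]
                    for i in range(k) for j in range(k))
    variance = explained + resid_var
    return mean, variance, explained / variance

# Hypothetical slopes, sample means, and covariance matrix for two regressors.
gamma = [2.0, 0.5]
x_mean = [0.4, 0.5]
x_cov = [[0.04, 0.01],
         [0.01, 0.25]]

mean, variance, r_squared = construct_moments(gamma, x_mean, x_cov, resid_var=0.2)
print(round(mean, 2), round(variance, 4), round(r_squared, 2))
```

With the actual estimates, the same steps would use the 24 structural slopes
together with the sample mean vector and covariance matrix of the x's.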

5.4 A simultaneous structural analysis of males and females 
in typical classes. 

In Muthen (1986), the above analysis is taken further by considering class
type differences. Here, we will instead study in more detail the differences
and similarities in measurement and structural parameters across gender.
A simultaneous, two-group analysis will be carried out for students of typical
classes. In these models, 14 x variables from the original set remain after
eliminating class type and gender related dummies. Table 7 gives
descriptive statistics for the regressors. We note that males have slightly
higher means on variables associated with high achievement, except for
USEFUL. The proportions correct for the posttest algebra items in typical
classes were for males: 0.14, 0.65, 0.50, 0.40, 0.47, 0.51, 0.37, 0.50, and
for females: 0.14, 0.69, 0.50, 0.46, 0.38, 0.53, 0.35, 0.56. The OTL values
are given in Table 4 and do not vary appreciably over gender.



Insert Table 7 about here

In the multiple-group analysis the effect of gender can be studied in more
detail than was possible in the single-group analysis of Model II. In Model
II, gender differences were only captured in the intercepts of the achievement
and the latent response variable regressions. Although interaction terms
between gender and other regressors in Model II could have been
accommodated in the achievement construct relation, the dummy variable
approach would not, for instance, be able to handle gender differences in
measurement slopes (loadings). Also, in a multiple-group analysis it is
easier to deal separately with tests of invariance in the measurement and the
structural part.

In this analysis we will apply a multiple-group version of the Figure 1
MIMIC model. Since the same measurement instrument was used for the two
sexes, we will test the notion of invariance in the measurement thresholds
and slopes (loadings) for the eight response items, allowing all other
parameters to differ across the two groups. Based on the previous analysis
results for all students, we will however allow the threshold and slope of
item 5 to vary. As a base-line model we will first consider a multiple-group
analysis of males and females where no parameters are invariant, in order
to assess the appropriateness of the MIMIC model itself. With 230 degrees
of freedom, this resulted in a chi-square value of model fit of 366. This fit
is judged as satisfactory. The total sample size is 2,417 broken down as
1,150 males and 1,267 females.



The addition of invariance of measurement intercepts and slopes, except
for item 5, resulted in a chi-square value of 381 with 248 degrees of
freedom, yielding a non-significant chi-square increase of 15 with 12
degrees of freedom compared to the base-line model. Also adding invariance
for item 5, however, resulted in a chi-square difference test value of 33
with 2 degrees of freedom. This strong rejection of the invariance notion
for item 5 is in line with our single-group results for Model II in all class
types. The parameter estimates for the multiple-group model of invariant
measurement thresholds and slopes, except for item 5, are given in Table 8.
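
The significance of the two difference tests can be checked with the upper
tail of the chi-square distribution. The following sketch uses only the Python
standard library; the helper name `chi2_sf` is hypothetical, and the test
values and degrees of freedom are those reported above.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        prob, a = math.exp(-half), 1.0               # exact tail for df = 2
    else:
        prob, a = math.erfc(math.sqrt(half)), 0.5    # exact tail for df = 1
    # Add two degrees of freedom at a time using
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while 2 * a < df:
        prob += half ** a * math.exp(-half) / math.gamma(a + 1)
        a += 1.0
    return prob

p_invariance = chi2_sf(15, 12)   # invariance for all items except item 5
p_item_5 = chi2_sf(33, 2)        # additional invariance for item 5

print(round(p_invariance, 2), p_item_5 < 0.001)
```

The first probability is well above conventional significance levels, while
the second is far below them, in line with the conclusions drawn in the text.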



Insert Table 8 about here



From the measurement part of Table 8 we find that item 1 has the lowest
correlation with the latent achievement construct. This is in line with the
low OTL value of 50 % in Table 4. For item 5, the gender difference in
thresholds and loadings translates into (see Muthen & Christoffersson, 1981,
equations 28 and 29) a two-parameter normal ogive a (discrimination) and b
(difficulty) value on a 0,1 θ-metric of 0.81 and 0.09 for males and 0.65 and
0.51 for females. Hence, the male item characteristic curve is shifted to
the left from the female curve and is steeper, thereby favoring males. The
reason for this gender difference is, however, unclear. The availability of
further external variables such as a reading comprehension test might
possibly have been able to shed light on this matter (c.f. Muthen, 1985).
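
The translation to a and b values for item 5 can be retraced approximately.
The sketch below does not quote equations 28 and 29 of Muthen &
Christoffersson (1981); it assumes a standard conversion in which the latent
response variable has unit residual variance given x and the construct is
standardized with its estimated mean and standard deviation. Thresholds,
loadings, and construct residual variances are read from Table 8 (the male
threshold of 2.06 is partly illegible in the scanned table and is an
assumption), and the construct means and standard deviations are those given
in the following paragraph.

```python
import math

def normal_ogive_a_b(threshold, loading, construct_mean, construct_sd,
                     construct_resid_var):
    """Convert probit MIMIC measurement parameters to IRT a and b (assumed formulas)."""
    # Measurement error SD under the assumption Var(y* | x) = 1.
    error_sd = math.sqrt(1.0 - loading ** 2 * construct_resid_var)
    a = loading * construct_sd / error_sd
    b = (threshold - loading * construct_mean) / (loading * construct_sd)
    return a, b

a_male, b_male = normal_ogive_a_b(2.06, 1.10, 1.81, 0.67, 0.15)
a_female, b_female = normal_ogive_a_b(2.13, 0.97, 1.88, 0.63, 0.13)

print(round(a_male, 2), round(b_male, 2))      # 0.81 0.09
print(round(a_female, 2), round(b_female, 2))  # 0.65 0.5
```

With these rounded inputs the male values match those reported, and the female
difficulty comes out at about 0.50, close to the reported value; the small gap
is presumably due to rounding in the tabled estimates.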

Regarding the structural slopes, the results are rather similar to those for
all students in Model II of Table 6. In the present model the intercept
difference in the structural relation for the latent variable construct is not
significantly different from zero. However, estimating the construct mean
from the estimated coefficients and the sample mean vector for the x's, we
find a value of 1.81 for males while females obtain 1.88. This difference
should be viewed in relation to the male standard deviation of 0.67 and the
female standard deviation of 0.63. Although males seemed to have slightly
higher means on important regressors in Table 7, females end up with a
slightly higher posttest achievement level. The proportion of variation in the
construct accounted for by the x's is 66 % for males and 68 % for females.

In addition to imposing restrictions of measurement parameter invariance,
it is also of interest to study the differences in the structural parameters
across gender. For instance, is the possibly higher level of the
achievement construct for females due to the fact that females have higher
slopes on important regressors (the important variable USEFUL would
however be an important exception)? Adding the restriction of invariant
structural slopes yields a chi-square difference of 29 with 14 degrees of
freedom, while restricting only the slopes for PREARITH to be equal across
sex yields a chi-square difference value of 2 with 1 degree of freedom. There
seems to be some evidence of differences in some of the slopes, although
PREARITH seems to have equal predictive strength for the two sexes.



6. Conclusions



The MIMIC structural modeling approach was found to be quite useful with
the present data where there was a particular interest in posttest responses
and where pretest data was available. Using a single model framework that
extends the boundaries of IRT, we were able to simultaneously deal not only
with issues of measurement qualities, but also differential item performance
in different subgroups and differential prediction of achievement.

Other versions of the general model of Section 3 would be relevant in
other situations. The external x variables need not only appear as background
variables, predicting the dichotomous y's. For instance, we may be
interested in the differential predictive validity in different groups of a set
of items or subtest scores for which certain constructs are hypothesized.
Here, careful measurement modeling carried out on the exogenous side may
lead to better predictions of a certain y criterion. The use of structural
modeling in such situations does not seem to have been fully explored.
TABLE 1

Description of External Variables

PREALG      Proportion of correct responses on seven pre-test core items.

PREMEAS     Proportion of correct responses on seven pre-test core items.

PREGEOM     Proportion of correct responses on eight pre-test core items.

PREARITH    Estimated pre-test theta based on the three-parameter logistic
            model using 16 items.

FAED        The highest type school attended by father or male guardian.
              1 = very little schooling, or no schooling at all
              2 = primary school
              3 = secondary school
              4 = college, university or some form of tertiary education

MOED        As in FAED, but for respondent's mother or female guardian.

MORED       Responses to the question "After this year, how many more years
            of full-time (including university, college, etc.) education do
            you expect or plan to complete?"
              1 = none at all (0 years)
              2 = up to 2 years
              3 = more than 2 years - up to 5 years
              4 = more than 5 years - up to 8 years
              5 = more than 8 years

USEFUL      Average score of four attitude items scored: Strongly
            disagree (1), Disagree (2), Undecided (3), Agree (4), and
            Strongly agree (5). These items are:
              1. I can get along well in everyday life without using
                 mathematics (Reversed).
              2. A knowledge of mathematics is not necessary in most of
                 occupations (Reversed).
              3. Mathematics is not needed in every day living (Reversed).
              4. Most people do not use mathematics in their jobs (Reversed).

ATTRACT     Average scores of five attitude items. Scoring is as for
            USEFUL and the items are:
              1. I would like to work at a job that lets me use mathematics.
              2. I think mathematics is fun.
              3. Working with numbers makes me happy.
              4. I am looking forward to taking more mathematics.
              5. I refuse to spend a lot of my own time doing mathematics
                 (Reversed).

Ethnicity dummy coding (0 = White)¹:
            NONWHITE

Class type dummy coding (0 = Typical class):
            REMEDIAL
            ENRICHED
            ALGEBRA

Gender dummy coding (0 = Male):
            FEMALE

Father's occupation dummy coding (0 = Middle)²:
            LOWOCC
            HIGHOCC
            MISSOCC

Notes.

1. The non-white category consists of American Indian, Black, Chicano,
Latin, Oriental, and Other.

2. The LOWOCC category of Father's occupation consists of the
classifications Unskilled and Semi-skilled worker, the Middle category
consists of Skilled worker, clerical, sales and related, the HIGHOCC
category consists of Professional and Managerial, and the MISSOCC category
consists of no response and unclassifiable response.






TABLE 2

Wording for Eight Posttest Algebra Core Items

1. If 5x + 4 = 4x - 31, then x is equal to

   A  -35
   B  -27
   C  3
   D  27
   E  35

2. If P = LW and if P = 12 and L = 3, then W is equal to

   A  3/4
   B  3
   C  4
   D  12
   E  36

3. (-2) x (-3) is equal to

   A  -6
   B  -5
   C  -1
   D  5
   E  6

4. If 4x/12 = 0, then x is equal to

   A  0
   B  3
   C  8
   D  12
   E  16

5. The air temperature at the foot of a mountain is 31 degrees. On top
   of the mountain the temperature is -7 degrees. How much warmer is the
   air at the foot of the mountain?

   A  -38 degrees
   B  -24 degrees
   C  7 degrees
   D  24 degrees
   E  38 degrees

6. A shopkeeper has x kg of tea in stock. He sells 15 kg and then
   receives a new lot weighing 2y kg. What weight of tea does he
   now have?

   A  x - 15 - 2y
   B  x + 15 + 2y
   C  x - 15 + 2y
   D  x + 15 - 2y
   E  None of these

7. The table below compares the height from which a ball is dropped (d)
   and the height to which it bounces (b).

      d    50    80   100   150
      b    25    40    50    75

   Which formula describes this relationship?

   A  b = d²
   B  b = 2d
   C  b = d/2
   D  b = d + 25
   E  b = d - 25

8. The sentence "a number x decreased by 6 is less than 12" can be
   written as the inequality

   A  x - 6 > 12
   B  x - 6 ≥ 12
   C  x - 6 < 12
   D  6 - x ≥ 12
   E  6 - x < 12






TABLE 3

Descriptive Statistics for the Different SIMS Samples

                   Pre-test Sample      Post-test Sample     Analysis Sample
                     (N = 6517)           (N = 7248)           (N = 4320)

                   Mean   S.D.    N     Mean   S.D.    N     Mean   S.D.    N

PREALG             0.40   0.25  6353     -      -      -     0.43   0.26  4320
PREMEAS            0.49   0.25  6353     -      -      -     0.51   0.24  4320
PREGEOM            0.??   0.23  6353     -      -      -     0.35   0.23  4320
PREARITH
 (OBS SCORE)       0.39   0.?3  6353     -      -      -      ?      ?      ?
PREARITH
 (THETA SCORE)      -      -      -      -      -      -     0.40   0.18  4320
FAED                -      -      -     0.80   0.24  6831    0.82   0.23  4320
MOED                -      -      -     0.79   0.22  6879    0.80   0.21  4320
MORED               -      -      -     0.75   0.20  6931    0.77   0.19  4320
USEFUL              -      -      -     0.71   0.19  6878    0.72   0.19  4320
ATTRACT             -      -      -     0.54   0.20  6856    0.54   0.20  4320
NONWHITE            -      -      -     0.26   0.44    ?     0.22   0.41  4320
REMEDIAL            -      -      -     0.08   0.27  7248    0.07   0.25  4320
ENRICHED            -      -      -     0.22   0.41  7248    0.2?   0.45  4320
ALGEBRA             -      -      -     0.13   0.34  7248    0.13   0.33  4320
FEMALE              -      -      -     0.52   0.50  7024    0.53   0.50  4320
LOWOCC              -      -      -     0.18   0.38  7248    0.18   0.39  4320
HIGHOCC             -      -      -     0.11   0.32  7248    0.13   0.33  4320
MISSOCC             -      -      -     0.42   0.49  7248    0.39   0.49  4320
POSTALG1            -      -      -     0.21   0.41  7013    0.22   0.41  4320
POSTALG2            -      -      -     0.6?   0.46  7013    0.72   0.45  4320
POSTALG3            -      -      -     0.57   0.50  7013    0.58   0.49  4320
POSTALG4            -      -      -     0.49   0.50  7013    0.51   0.50  4320
POSTALG5            -      -      -     0.45   0.50  7013    0.47   0.50  4320
POSTALG6            -      -      -     0.55   0.50  7013    0.57   0.49  4320
POSTALG7            -      -      -     0.39   0.49  7013    0.40   0.49  4320
POSTALG8            -      -      -     0.56   0.50  7013    0.59   0.49  4320
ALG OTL             -      -      -     0.71   0.26  6914    0.72   0.26  4224
TABLE 4

Proportion Correct and Opportunity to Learn (OTL)
Proportions for the Eight Post Test Algebra Core Items
by Class Type

                                   Item
Class Type      1      2      3      4      5      6      7      8

Proportion Correct

REMEDIAL      0.09   0.44   0.14   0.22   0.14   0.30   0.2?   0.31
TYPICAL       0.14   0.67   0.50   0.43   0.42   0.52   0.36   0.53
ENRICHED      0.22   0.81   0.73   0.63   0.55   0.63   0.46   0.68
ALGEBRA       0.65   0.90   0.90   0.81   0.71   0.85   0.58   0.84
TOTAL         0.22   0.72   0.58   0.51   0.47   0.57   0.40   0.59

OTL Proportion

REMEDIAL      0.21   0.61   0.43   0.41   0.65   0.09   0.16   0.20
TYPICAL       0.50   0.85   0.97   0.76   0.?3   0.40   0.38   0.64
ENRICHED      0.7?   0.96   0.94   0.94   0.95   0.47   0.58   0.83
ALGEBRA       0.95   0.95   1.00   0.95   1.00   0.35   0.81   1.00
TOTAL         0.61   0.87   0.93   0.80   0.92   0.46   0.47   0.70

Sample Size

REMEDIAL      TYPICAL      ENRICHED      ALGEBRA      TOTAL
   299          2417         1061          543         4320
TABLE 5

Modification Indices for a Structural Model,
All Students. Model I (N = 4,320)

                   Item 1  Item 2  Item 3  Item 4  Item 5  Item 6  Item 7  Item 8

Direct Relationships Between Items and Regressors

PREALG                2      -1      -1       2       0      -1       0      -1
PREMEAS              -1       2      -2      -1       4      -2       1      -2
PREGEOM               1       0      -1      -3       1      -1       0       2
PREARITH             -1       1      -1      -1       3      -1       0      -1
FAED                 -3       0       2       1       3      -3      -1       0
MOED                 -1       0       1       1       3       0      -1      -3
MORED                 0      -1       1       0      -1       1       0      -2
USEFUL               -1       1       0       0      -2       1      -1       1
ATTRACT               3       1      -1       0      -2       1       0      -1
NONWHITE              3      -4       1       5      -8       3       2      -1
REMEDIAL              3       1      -5       0      -1       2       2       0
ENRICHED             -9       3       8       5      -4      -7      -1       2
ALGEBRA              17*     -4       1       0      -6      -2      -4       0
FEMALE                0       5       3       4     -19*      0      -5      10
LOWOCC                1       0       1      -1      -2       1       0       1
HIGHOCC              -1       1      -1       2      -2      -1       3       0
MISSOCC               1      -2      -1       3       1       0      -2      -1
NONW X REM            2       0      -3       1       0       0       0       0
NONW X ENR            0      -1       0       2      -3       0       1       2
NONW X ALG            2      -1       0       1       0       0       0      -1
PREARITH X REM        1       0      -1       0       0       0       0       0
PREARITH X ENR       -4       1       3       2       0      -3       0       0
PREARITH X ALG        9      -3       0       0      -3      -1      -2       0
NONW X PREARITH       1      -1       1       2      -2       1       0       0

Measurement Error Correlations

                   Item 1  Item 2  Item 3  Item 4  Item 5  Item 6  Item 7

Item 2                ?
Item 3                ?      -1
Item 4               -3       0       7
Item 5               -6     -12       7     -11
Item 6               -4      10      -4       4       5
Item 7               -1       2      -6      -5      13       2
Item 8               -4      -3      -8       2      -7      21*      ?

* Freed parameter in Model II.
TABLE 6

Parameter Estimates for a Structural Model.
All Students. Model II (N = 4,320)

Measurement Parameter Estimates

Response        Thresholds            Loadings
Item          Est.   Est./S.E.     Est.   Est./S.E.    Reliability

Item 1        2.19      27         0.54      16           0.19
Item 2        1.23      14         0.88      22           0.41
Item 3        1.91      20         1.00¹                  0.49
Item 4        1.76      20         0.8?      23           0.37
Item 5        1.85      20         0.89      23           0.42
Item 6        1.59      19         0.82      22           0.37
Item 7        1.57      21         0.59      19           0.22
Item 8        1.34      17         0.73      21           0.32

Error correlation for
Items 6 and 8                      0.12       5

¹ Parameter is fixed to set the metric of the latent variable construct.

TABLE 6 Cont'd.

Structural Parameters with the
Latent Construct as Dependent Variable

Regressor             Estimate    Estimate/S.E.

PREALG                   ?            11
PREMEAS                 0.45           7
PREGEOM                  ?             ?
PREARITH                2.09          16
FAED                    0.07           1
MOED                     ?             ?
MORED                    ?             0
USEFUL                   ?             7
ATTRACT                  ?             1
NONWHITE                 ?             ?
REMEDIAL                 ?             1
ENRICHED                0.22           3
ALGEBRA                 0.56           4
FEMALE                  0.14           6
LOWOCC                  0.02           1
HIGHOCC                 0.12           ?
MISSOCC                 0.05           2
NONW X REM              0.10           1
NONW X ENR              0.19           3
NONW X ALG             -0.18           ?
PREARITH X REM         -1.45          -3
PREARITH X ENR         -0.10          -1
PREARITH X ALG         -0.54          -2
NONW X PREARITH        -0.19          -1

Item - Regressor Relations not Mediated by Latent Construct

Item 1 on ALGEBRA       0.86          13
Item 5 on FEMALE       -0.35          -8

Latent Construct
Residual Variance       0.?0          13
TABLE 7

Means and Standard Deviations for Males and Females
in Typical Classes

                     Male (N = 1,150)      Female (N = 1,267)
Regressors            Mean     S.D.          Mean     S.D.

PREALG                 ?       0.23          0.37     0.23
PREMEAS                ?       0.23          0.45     0.23
PREGEOM                ?        ?            0.29     0.19
PREARITH               ?        ?            0.36     0.15
FAED                  0.81     0.?5          0.79     0.23
MOED                  0.80     0.20          0.78     0.21
MORED                 0.74     0.20          0.74     0.19
USEFUL                0.69     0.19          0.73     0.17
ATTRACT               0.52     0.20          0.54     0.20
NONWHITE              0.21     0.41          0.23     0.42
LOWOCC                0.21     0.41          0.20     0.40
HIGHOCC               0.11     0.31          0.11     0.31
MISSOCC               0.37     0.48          0.40     0.49
NONW X PREARITH       0.06     0.13          0.07     0.14
Figure 1.
A MIMIC Structural Probit Model









TABLE 8

Parameter Estimates for a Simultaneous Structural Model
Analysis of Males and Females in Typical Classes

Measurement Parameter Estimates
(Thresholds and loadings invariant over gender, except for item 5)

Response        Thresholds            Loadings            Reliabilities
Item          Est.   Est./S.E.     Est.   Est./S.E.     Males    Females

Item 1        2.16    18.79        0.55    10.07         0.13      0.11
Item 2        1.50     9.56        1.09    14.66         0.39      0.36
Item 3        1.83    12.61        1.00²                 0.35      0.31
Item 4        1.83    13.54        0.90    14.25         0.29      0.26
Item 5                                                   0.40      0.30
  Males       2.06    11.91        1.10    12.51
  Females     2.13    11.72        0.97    11.69
Item 6        1.74    12.59        0.98    14.43         0.33      0.30
Item 7        1.71    1?.45        0.72    12.22         0.20      0.13
Item 8        1.52    11.65        0.88    14.00         0.28      0.26

Structural Parameter Estimates

                      Males (N = 1,150)       Females (N = 1,267)
Regressors            Est.    Est./S.E.        Est.    Est./S.E.

PREALG                0.46        9            0.61        7
PREMEAS               0.51        5            0.46        5
PREGEOM               0.43        4            0.23        2
PREARITH              1.67        9            2.01       10
FAED                 -0.12       -1            0.14        2
MOED                  0.19        2            0.00        0
MORED                 0.14        1            0.20        2
USEFUL                0.62        6            0.34        3
ATTRACT              -0.01        0            0.11        1
NONWHITE              0.10        1            0.02        0
LOWOCC                0.02        0           -0.03        ?
HIGHOCC               0.12        2            0.06        ?
MISSOCC               0.12        3           -0.07        ?
NONW X PREARITH      -0.76       -3           -0.17        ?

Latent Construct
Intercept             0.00¹                    0.12        ?

Latent Construct
Residual              0.15        7            0.13        ?

¹ Fixed parameter